NSF PAR Search | NSF Public Access Repository

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

Sun, Fan-Yun; Liu, Weiyu; Gu, Siyu; Lim, Dylan; Bhat, Goutam; Tombari, Federico; Li, Manling; Haber, Nick; Wu, Jiajun (June 2025, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Free, publicly-accessible full text available June 5, 2026
Quaternion Equivariant Capsule Networks for 3D Point Clouds

https://doi.org/10.1007/978-3-030-58452-8_1

Zhao, Yongheng; Birdal, Tolga; Lenssen, Jan Eric; Menegatti, Emanuele; Guibas, Leonidas; Tombari, Federico (November 2020, European Conference on Computer Vision)
We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations and invariant to permutations of the input points. The operator receives a sparse set of local reference frames computed from an input point cloud and establishes end-to-end transformation equivariance through a novel dynamic routing procedure on quaternions. Further, we theoretically connect dynamic routing between capsules to the well-known Weiszfeld algorithm, a scheme for solving iteratively reweighted least squares (IRLS) problems with provable convergence properties. We show that such group dynamic routing can be interpreted as robust IRLS rotation averaging on capsule votes, where information is routed based on the final inlier scores. Based on our operator, we build a capsule network that disentangles geometry from pose, paving the way for more informative descriptors and a structured latent space. Our architecture allows joint object classification and orientation estimation without explicit supervision of rotations. We validate our algorithm empirically on common benchmark datasets.
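For readers unfamiliar with the Weiszfeld algorithm named in this abstract, the following is a minimal sketch of the classical Euclidean version, which estimates a geometric median by iteratively reweighted least squares. It is not the paper's quaternion routing procedure. The function name `geometric_median` and the parameters `iterations` and `eps` are illustrative assumptions, not taken from the paper.

```python
import math


def geometric_median(points, iterations=100, eps=1e-9):
    """Classical Weiszfeld iteration for the geometric median in R^d.

    Illustrative sketch only: the name and parameters are assumptions,
    and this is the Euclidean textbook scheme, not quaternion routing.
    """
    dim = len(points[0])
    # Start from the centroid, i.e. the ordinary least-squares estimate.
    estimate = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iterations):
        # IRLS weights: inverse distance to the current estimate.
        # Distances are clamped at eps to avoid division by zero when
        # the estimate lands on a data point.
        weights = [1.0 / max(math.dist(p, estimate), eps) for p in points]
        total = sum(weights)
        # Weighted least-squares update: the weighted mean of the points.
        updated = [
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ]
        if math.dist(updated, estimate) < eps:
            return updated
        estimate = updated
    return estimate


if __name__ == "__main__":
    # Four nearby points plus one far outlier; the outlier receives a
    # small weight, so it pulls the estimate far less than it pulls the mean.
    samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (100.0, 0.0)]
    print(geometric_median(samples))
```

Each pass down-weights points that lie far from the current estimate. This is the sense in which the abstract describes routing as robust averaging with inlier scores.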
Full Text Available
Incremental scene understanding on dense SLAM

https://doi.org/10.1109/IROS.2016.7759111

Li, Chi; Xiao, Han; Tateno, Keisuke; Tombari, Federico; Navab, Nassir; Hager, Gregory D. (October 2016, IROS)

We present an architecture for online, incremental scene modeling which combines a SLAM-based scene understanding framework with semantic segmentation and object pose estimation. The core of this approach comprises a probabilistic inference scheme that predicts semantic labels for object hypotheses at each new frame. From these hypotheses, recognized scene structures are incrementally constructed and tracked. Semantic labels are inferred using a multi-domain convolutional architecture which operates on the image time series and which enables efficient propagation of features as well as robust model registration. To evaluate this architecture, we introduce a large-scale RGB-D dataset, JHUSEQ-25, as a new benchmark for sequence-based scene understanding in complex and densely cluttered scenes. This dataset contains 25 RGB-D video sequences with 100,000 labeled frames in total. We validate our method on this dataset and demonstrate improved semantic segmentation and 6-DoF object pose estimation compared with single-view methods.
Full Text Available